Today’s Agenda

  1. Go over the syllabus

  2. Introduction to R and RStudio

  3. Installing R and RStudio

  4. Basic R syntax and programming

  5. Hands-on practice


Go over the syllabus

Let us switch to Canvas where a copy of the course syllabus is located.

Class Format

Starting with Lecture 2, classes will generally follow this format:


Introduction to R and RStudio

What is R?


Figure 1. A screenshot of what R looks like on macOS. Note: You will never actually work with R directly. You will work with R through RStudio.


Introduction to R and RStudio

Why learn R?


Introduction to R and RStudio

What is RStudio?

[1] IDEs are tools designed to increase programmer productivity by combining common activities of writing software into a single application: editing source code, building executables, and debugging.


Figure 2. A screenshot of what RStudio looks like on macOS.


Why learn both R and RStudio?

Both tools are widely used by scientists, academics, data analysts, and data scientists.

According to Glassdoor (as of June 6, 2024):

On my old team at the United States Department of Agriculture, data analysts and data scientists are making $117,962 - $153,354 as of 2025.

On my current team at another federal agency, data scientists are making $139,395 - $181,216 as of 2025.


My personal experience with R

Some examples of past work that leveraged both R and RStudio


Installing R and RStudio

Let us switch to Canvas where a copy of installation instructions is located.

We will spend up to 20 minutes making sure both R and RStudio are installed on your computer.


RStudio Basics

Warning: This is an applied data science class, so you will get a lot of practice. However, today’s class is full of definitions and terminology that you will become familiar with over the semester. You don’t need to memorize most of the terms that follow. Although you will get some practice today, I see Lecture 2 next week as the real first class, where you will start coding properly.


Today’s penultimate slide will summarize the key takeaways after we finish today’s exercise.


RStudio Basics

There are four ‘panes’ or windows in RStudio that we generally use. Immediately after installation, you may only see three (more on this later). But once you start writing and saving R scripts, you will regularly interact with all four panes.

These panes are:


Figure 3. The Environment pane in RStudio, which lets you track which variables or data have been saved into the R environment. More on this later.


Figure 4. The Console pane in RStudio (Linux version), where you can type in and execute code.


Figure 5. Here is an example of a simple script for addition. Type '4+2' then press Enter/return in the Console Pane.


Brief Detour: Basic Arithmetic Operators in R

Since we are discussing how to run simple R scripts:

Figure 6. Five basic arithmetic operators you can perform in R.

Note: + is addition; - is subtraction; * is multiplication; / is division; and ** or ^ is exponentiation.
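
As a quick sketch of the five operators in the note above, the lines below can be typed into the Console pane one at a time. The numbers 4 and 2 are arbitrary choices for illustration; lines starting with ## show the output R prints.

```r
# Addition
4 + 2
## [1] 6

# Subtraction
4 - 2
## [1] 2

# Multiplication
4 * 2
## [1] 8

# Division
4 / 2
## [1] 2

# Exponentiation: ^ and ** give the same result
4 ^ 2
## [1] 16
4 ** 2
## [1] 16
```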


Console Pane

Figure 5. Here is an example of a simple script for addition. Type '4+2' then press Enter/return in the Console Pane.

Throughout this course, we will rarely type in and run scripts from the Console pane. Generally, you want to save the code you write in an R file and run it from there (more on this later).

Today is one of those exceptions. Generally, we will run scripts in the Console pane to install packages. For now, you may think of packages as collections of tools that increase productivity and perform specific tasks (e.g., certain packages can help you create maps). Within the next few slides, we will install packages that you will need for the first four weeks of the course.


The tidyverse


Install the tidyverse packages

Type in, then execute (by pressing Enter/return), this code in your Console pane: install.packages("tidyverse")

You only need to install this once. If you’ve used R previously, you may already have it.

Once you have tidyverse installed, you need to load the package each time you start a new R session.


More generally, this is the template script for installing packages: install.packages("[fill in package name]")
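
If you are not sure whether a package is already installed, one possible check (a minimal sketch, not part of the course instructions) uses the base R function installed.packages(). The "stats" package is used below only because it ships with every R installation.

```r
# TRUE if tidyverse is already installed on this computer, FALSE otherwise.
"tidyverse" %in% rownames(installed.packages())

# "stats" comes with R itself, so this check returns TRUE.
"stats" %in% rownames(installed.packages())
## [1] TRUE
```

If the first check returns FALSE, run the install.packages() line from the previous slide.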


Loading packages in R

To load a package, you need to run/execute this template script: library([fill in package name])

Note: Quotation marks are not required when loading a package. Again, you need to load the package each time you start a new R session. Once a package is loaded, you can use the programming ‘tools’ it includes to increase your productivity and perform highly specialized tasks.

This may seem trivial for now, but you will get a lot of practice throughout the course. This is how you load the tidyverse package into R:

library(tidyverse)


Files Pane

Figure 9. A screenshot of the Files pane. We will keep revisiting the Files pane throughout the semester.


Source Pane: Missing

Figure 10. The three panes you see when you first open RStudio. The fourth pane, called the Source pane, is not shown initially.


Source Pane: Creating and saving R Scripts

Figure 11. The fourth pane (Source pane) appears when you create a new R Script file (or open an existing R file).

For practice (live demo): Click File > New File > R Script. In the file, type one of the scripts you learned (e.g., one that uses the five basic arithmetic operators). Once you are done, click File > Save As. Name the file whatever you want and save it in a location you can remember. Close RStudio, then try double-clicking the file in the location where you saved it.


Figure 12. What you will see once you have opened a saved file and run the script in it.

Note: To run a script from an R Script file, click anywhere on line 1 (or highlight the code you want to run), then press the ‘Run’ button in the upper right corner of the Source pane.


Source pane: Creating and saving R Scripts

A side note: Although I am teaching you how to create an R Script (i.e., File > New File > R Script), we will be creating and using Quarto notebooks (more on this next week) throughout this semester.


Commenting your R Scripts/Code

Comments can be used to explain R code and to make it more readable.

Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

Figure 13. An example of commented code in R. Important note: The red 'Untitled1' means this script is unsaved, so make sure to always save your scripts.
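
For reference, here is a small sketch of what commented code can look like when typed out. The arithmetic is arbitrary and only serves to show where comments can go.

```r
# This whole line is a comment, so R skips it.
4 + 2  # A comment can also follow code on the same line.
## [1] 6

# The next line is "commented out", so R does not run it:
# 4 * 2
```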


Data Types in R: A focus on data frame (tabular data)

There are data types in R that we will never use. Moreover, this is not a comprehensive programming course in R.

The one data type that we will commonly use and manipulate throughout the semester is called a data frame. A data frame is a data structure made up of rows and columns, similar to a nicely structured Excel spreadsheet or Google Sheets file. I may sometimes refer to this as tabular data.

Figure 14. An example of tabular data in Excel. When loaded into R, this will be read as a data frame.
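
To make the idea concrete, here is a minimal sketch that builds a small data frame by hand with the base R function data.frame(). The name crops and the column names and values are made up for illustration; in this course, data frames will usually come from files rather than being typed in.

```r
# Build a data frame with three rows and two columns.
# Each argument becomes one column; all values here are invented.
crops <- data.frame(
  crop  = c("corn", "wheat", "soybeans"),
  acres = c(120, 80, 95)
)

# Print the data frame to the console.
crops

# Count its rows and columns.
nrow(crops)
## [1] 3
ncol(crops)
## [1] 2
```

After running this, crops should also appear in the Environment pane.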


Functions

What is a function in R? A function is one of the most commonly used objects in R. It is a reusable piece of code that performs a specific task.

library() is an example of a function you were briefly introduced to in earlier slides. It is R code that allows you to load a package. library(tidyverse) uses the library function to load the tidyverse package into R.

The file we will open and go through for today’s hands-on practice will introduce you to other functions in R that work with data frames.


Functions: Some examples.

log() is an R function that takes the logarithm of the number you feed into it. Note: by default, log() computes the natural logarithm (ln).

# calculates ln(10)
log(10)
## [1] 2.302585


exp() is another R function that computes the exponential value of the number you feed into it. For example, exp(2) is equivalent to calculating \(e^{2}\).

# calculates exp(2) 
exp(2)
## [1] 7.389056
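
As a short follow-up sketch using only base R functions: logarithms in other bases are also available, and log() undoes exp(). The input numbers are arbitrary.

```r
# Base-10 logarithm
log10(100)
## [1] 2

# Base-2 logarithm, written two equivalent ways
log2(8)
## [1] 3
log(8, base = 2)
## [1] 3

# log() is the inverse of exp(), so this returns the original number
log(exp(2))
## [1] 2
```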


Functions: Base R.

Terminology: Functions such as exp() and log() are functions from what is called base R. Base R refers to built-in tools from the default installation of R.

The focus of this class, however, is the use of functions from packages to perform highly specialized tasks. Lectures 2-4, for example, use tidyverse functions for data visualization and manipulation.


Hands-on Practice 1

Download Exercise1.R from Canvas, then follow the demo provided in class on your own computer. Today’s exercise aims to give you practice running scripts and to make you more familiar with RStudio.


Key Takeaways from Exercise1.R

This is a pattern we will generally use throughout the semester:


Next week: We will be creating basic data visualizations.


Homework will be posted on Canvas on January 17th (Friday). It will be due on January 24th (Friday) by 11:59pm.